- A regression algorithm for high-dimensional data
- Builds the model incrementally like forward selection
- But less greedy and more statistically efficient
- Computes the entire LASSO path with a small modification
- Produces continuous, piecewise-linear coefficient trajectories
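
The points above can be illustrated with a minimal sketch. It assumes scikit-learn's `lars_path` function and uses synthetic data as a stand-in, not the soil black carbon dataset. The variable names and the simulated coefficients are illustrative assumptions only.

```python
import numpy as np
from sklearn.linear_model import lars_path

# Synthetic stand-in data (an assumption; not the soil black carbon dataset).
rng = np.random.default_rng(0)
n_samples, n_features = 100, 10
X = rng.standard_normal((n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[:3] = [3.0, -2.0, 1.5]  # only three predictors carry signal
y = X @ true_coef + 0.1 * rng.standard_normal(n_samples)

# Center the data so that no intercept is needed along the path.
X = X - X.mean(axis=0)
y = y - y.mean()

# method="lar" runs plain Least Angle Regression;
# method="lasso" applies the small modification that yields the LASSO path.
alphas_lar, active_lar, coefs_lar = lars_path(X, y, method="lar")
alphas_lasso, active_lasso, coefs_lasso = lars_path(X, y, method="lasso")

# Each coefs array has shape (n_features, n_breakpoints): one column per
# breakpoint of the path, starting from the all-zero model.
print("LAR breakpoints:", coefs_lar.shape[1])
print("LASSO breakpoints:", coefs_lasso.shape[1])
print("Order in which predictors entered (LAR):", list(active_lar))
```

Each column of `coefs_lar` is one breakpoint. Between consecutive breakpoints the coefficients change linearly, so plotting the rows against the column index reproduces the piecewise-linear trajectories.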

This project uses the Burke et al. (2022) global urban soil black carbon dataset, available from the Knowledge Network for Biocomplexity (KNB): https://knb.ecoinformatics.org/view/urn:uuid:1651eeb1-e050-4c78-8410-ec2389ca2363

The dataset compiles measurements of black carbon in urban soils from cities around the world. Each row records latitude and longitude, elevation, precipitation, soil temperature at several depths, land-cover type, population information, and notes from the original studies. The main sheet (“Urban Black Carbon”) contains more than 600 observations and about 65 variables, providing a broad mix of environmental and geographic predictors.

Because many of these variables move together (climate, location, soil traits, etc.), the dataset naturally contains clusters of correlated features, which makes it a good fit for demonstrating Least Angle Regression (LARS).
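
One way to check for such clusters is to list the predictor pairs with a high absolute correlation. The sketch below assumes pandas and uses a small synthetic table. The column names (`precip`, `soil_temp_shallow`, `soil_temp_deep`, `elevation`) and the 0.8 threshold are hypothetical and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in table; column names are hypothetical, not the real ones.
rng = np.random.default_rng(1)
n = 200
climate = rng.standard_normal(n)  # shared latent driver for three columns
df = pd.DataFrame({
    "precip": climate + 0.2 * rng.standard_normal(n),
    "soil_temp_shallow": climate + 0.2 * rng.standard_normal(n),
    "soil_temp_deep": climate + 0.2 * rng.standard_normal(n),
    "elevation": rng.standard_normal(n),  # independent of the others
})

# Absolute pairwise correlations between all columns.
corr = df.corr().abs()

# Keep only the strict upper triangle so each pair appears once.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack().dropna().sort_values(ascending=False)

# Pairs above an (assumed) threshold form candidate correlated clusters.
high = pairs[pairs > 0.8]
print(high)
```

On the real data, the same steps would apply to the numeric columns of the main sheet once it has been loaded into a DataFrame.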